Enhanced Web Mining Technique To Clean Web Log File

Author

  • Rachit Goel
Abstract

The arrival of computer technology has made it possible to produce and store massive amounts of data. The world is no longer confined to manually generated files and reports; it has become a giant store in which vast amounts of data are collected and exchanged daily. Web pages typically contain a large amount of information that is not part of their main content, e.g. banner ads, navigation bars, and copyright notices. Such noise usually leads to poor results in web mining, which depends mainly on web page content. It is therefore essential to extract information from this bulk of data and structure it into useful knowledge. This need gave rise to data mining. Web usage mining is the subfield of data mining that deals with the discovery and analysis of usage patterns in web data, specifically web logs, in order to improve web-based applications. Its aim is to find users' access patterns, such as frequent access paths, frequent access page groups, and user clusters, automatically and quickly from vast web log data. Through web usage mining, the server log, registration information, and other related information left by users provide a foundation for organizational decision making.

Similar resources

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms, which search according to the user interests...

Web Navigation Path Pattern Prediction using First Order Markov Model and Depth first Evaluation

Web usage mining has been defined as a technique for finding hidden knowledge in a log file. The interaction between a website and its users is recorded in the related web server log file. A web designer can analyze the file in order to understand the interaction between users and a web site, which helps to improve web topology. All information of web usage can be generated from log files and it...

User Navigation Pattern Discovery using Fast Adaptive Neuro-Fuzzy Inference System

The World Wide Web is a huge repository of web pages and links. It provides abundant information for Internet users. The growth of the web is incredible, as can be seen today. Users’ accesses are recorded in web logs. From the user’s perspective, it is very difficult to extract useful knowledge from the huge amount of information and secondly, it is also difficult to extract for the us...

An Efficient Preprocessing Methodology of Log File for Web Usage Mining

Nowadays, the WWW has become an important and huge data store. All users' activities are stored in a log file. The log file shows users' interest in a particular website. With the wide usage of the internet, log file size is growing rapidly. Web mining is the process of extracting information from web data. The raw log file won't reveal the users' accessing pattern. Thus, preprocessin...

Null Value Estimation in Web Environment by using Fuzzy Rule based K-Mean Clustering

In the web environment, web log files capture operational data generated through the internet for analysing users' browsing behaviour and many other security issues. The captured operational data is useful for building user profiles and for web design, and acts as evidence in web forensics and many other security issues. In the real world there are lots of systems that participate in the web environment having incomplete ...

Publication date: 2014